AITopics | mrc dataset

Collaborating Authors

mrc dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ViQA-COVID: COVID-19 Machine Reading Comprehension Dataset for Vietnamese

Nguyen-Phung, Hai-Chung, Lê, Ngoc C., Nguyen, Van-Chien, Nguyen, Hang Thi, Nguyen, Thuy Phuong Thi

arXiv.org Artificial Intelligence, Jun-17-2025

Two years after its emergence, COVID-19 has negatively affected people and daily life around the world. As of May 2022, there were more than 522 million cases and six million deaths worldwide (including nearly ten million cases and over forty-three thousand deaths in Vietnam). The economy and society have both been severely affected. The Omicron variant of COVID-19 has broken through countries' disease prevention measures and rapidly increased the number of infections. Resources for treatment and epidemic prevention are overloaded all over the world. Applying artificial intelligence (AI) to support people at this time is therefore extremely necessary. Many studies applying AI to help prevent COVID-19 have been extremely useful, and studies on machine reading comprehension (MRC) are among them. Motivated by this, we created ViQA-COVID, the first MRC dataset about COVID-19 for Vietnamese, which can be used to build models and systems that contribute to disease prevention. ViQA-COVID is also the first multi-span extraction MRC dataset for Vietnamese, and we hope that it can help promote MRC research in Vietnamese and in multilingual settings.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2504.21017

Country:

Asia (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
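ViQA-COVID is described as a multi-span extraction dataset, meaning a single question can have several answer spans in the passage. The abstract does not say how models should output multiple spans; one common formulation, assumed here and not taken from the paper, tags each passage token as B (begins a span), I (inside a span) or O (outside) and then reads off every contiguous B/I run. A minimal Python sketch of that decoding step, with the helper name `decode_bio_spans` chosen for illustration:

```python
from typing import List


def decode_bio_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn per-token B/I/O tags into a list of answer spans.

    Hypothetical helper: assumes a BIO tagging formulation of multi-span
    extraction, which the ViQA-COVID abstract does not specify.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span begins; close any span that is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Extend the currently open span.
            current.append(token)
        else:
            # "O", or an "I" with no open span, ends the current span.
            if current:
                spans.append(" ".join(current))
            current = []
    # Flush a span that runs to the end of the passage.
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["Symptoms", "include", "fever", ",", "dry", "cough", "and", "fatigue"]
tags = ["O", "O", "B", "O", "B", "I", "O", "B"]
print(decode_bio_spans(tokens, tags))  # ['fever', 'dry cough', 'fatigue']
```

Treating a stray I (one with no open span) as the end of a span is a design choice; some decoders instead promote it to B.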


Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets

Foolad, Shima, Kiani, Kourosh, Rastgoo, Razieh

arXiv.org Artificial Intelligence, Aug-4-2024

This paper provides a thorough examination of recent developments in the field of multi-choice Machine Reading Comprehension (MRC). Focused on benchmark datasets, methodologies, challenges, and future trajectories, our goal is to offer researchers a comprehensive overview of the current landscape in multi-choice MRC. The analysis delves into 30 existing cloze-style and multiple-choice MRC benchmark datasets, employing a refined classification method based on attributes such as corpus style, domain, complexity, context style, question style, and answer style. This classification system enhances our understanding of each dataset's diverse attributes and categorizes them based on their complexity. Furthermore, the paper categorizes recent methodologies into Fine-tuned and Prompt-tuned methods. Fine-tuned methods involve adapting pre-trained language models (PLMs) to a specific task through retraining on domain-specific datasets, while prompt-tuned methods use prompts to guide PLM response generation, presenting potential applications in zero-shot or few-shot learning scenarios. By contributing to ongoing discussions, inspiring future research directions, and fostering innovations, this paper aims to propel multi-choice MRC towards new frontiers of achievement.

comprehension, dataset, mrc dataset, (13 more...)

arXiv.org Artificial Intelligence

2408.02114

Country:

Europe > Norway (0.14)
Europe > Denmark (0.14)
North America > Puerto Rico (0.04)
(12 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Education > Assessment & Standards > Student Performance (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
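The survey above contrasts fine-tuned methods with prompt-tuned methods, in which a prompt guides a pre-trained language model's response, enabling zero-shot or few-shot use. As a rough illustration of the prompting side only, the sketch below formats a passage, question and lettered options into a single zero-shot prompt and parses a letter back out of a model's completion. The template wording and both function names are hypothetical; the survey does not prescribe a template, and no particular model API is assumed.

```python
import re
from typing import List, Optional

OPTION_LETTERS = "ABCDEFGHIJ"


def build_multichoice_prompt(passage: str, question: str, options: List[str]) -> str:
    """Format one multi-choice MRC item as a zero-shot prompt (hypothetical template)."""
    if not 1 <= len(options) <= len(OPTION_LETTERS):
        raise ValueError("expected between 1 and 10 options")
    lines = [f"Passage: {passage}", f"Question: {question}", "Options:"]
    for letter, option in zip(OPTION_LETTERS, options):
        lines.append(f"{letter}. {option}")
    lines.append("Reply with the letter of the correct option.")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer_letter(completion: str, num_options: int) -> Optional[int]:
    """Return the index of the first standalone option letter in the completion, if any."""
    if not 1 <= num_options <= len(OPTION_LETTERS):
        raise ValueError("expected between 1 and 10 options")
    valid = OPTION_LETTERS[:num_options]
    match = re.search(rf"\b([{valid}])\b", completion)
    return valid.index(match.group(1)) if match else None


prompt = build_multichoice_prompt(
    "Tom missed the bus, so he walked to school.",
    "How did Tom get to school?",
    ["By bus", "On foot", "By car"],
)
print(prompt)
print(parse_answer_letter("The answer is B.", 3))  # 1
```

The parser is deliberately naive: a completion that starts with the article "A" would be read as option A, so a real evaluation would need stricter output constraints.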


Transfer Learning Enhanced Single-choice Decision for Multi-choice Question Answering

Cui, Chenhao, Jiang, Yufan, Wu, Shuangzhi, Li, Zhoujun

arXiv.org Artificial Intelligence, Apr-27-2024

Multi-choice Machine Reading Comprehension (MMRC) aims to select the correct answer from a set of options based on a given passage and question. Existing methods employ a pre-trained language model as the encoder and share and transfer knowledge through fine-tuning. These methods mainly focus on designing elaborate mechanisms to effectively capture the relationships among the passage, question, and answer triplet. Transferring knowledge from other MRC tasks such as SQuAD is non-trivial because of the task-specific format of MMRC, and it has largely been ignored. In this paper, we reformulate multi-choice as single-choice by training a binary classifier to distinguish whether a given answer is correct, and then select the option with the highest confidence score as the final answer. Our proposed method dispenses with the multi-choice framework and can leverage resources from other tasks. We build our model on ALBERT-xxlarge and evaluate it on the RACE and DREAM datasets. Experimental results show that our model performs better than multi-choice methods. In addition, by transferring knowledge from other kinds of MRC tasks, our model achieves state-of-the-art results in both single and ensemble settings.

computational linguistic, dataset, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2404.17949

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.04)
North America > United States > New York (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education > Assessment & Standards > Student Performance (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.42)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.41)
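The single-choice reformulation in the paper above reduces to two steps: score every (passage, question, option) triple independently with a binary "is this option correct?" classifier, then pick the option with the highest confidence. The sketch below shows only that decision rule. The paper's classifier is a fine-tuned ALBERT-xxlarge model; here a hypothetical word-overlap function, `toy_overlap_logit`, stands in for it so the example runs without any model weights, and its scores mean nothing beyond this illustration.

```python
import math
from typing import Callable, List, Tuple

# A scorer maps (passage, question, option) to a logit for "this option is correct".
Scorer = Callable[[str, str, str], float]


def toy_overlap_logit(passage: str, question: str, option: str) -> float:
    """Hypothetical stand-in for a fine-tuned binary classifier.

    Scores an option by the fraction of its words that appear in the passage;
    the question is ignored by this toy scorer.
    """
    passage_words = set(passage.lower().split())
    option_words = option.lower().split()
    if not option_words:
        return -10.0
    overlap = sum(word in passage_words for word in option_words) / len(option_words)
    return 4.0 * overlap - 2.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def single_choice_decision(
    passage: str, question: str, options: List[str], scorer: Scorer = toy_overlap_logit
) -> Tuple[int, List[float]]:
    """Score each option on its own and return the most confident one."""
    if not options:
        raise ValueError("at least one option is required")
    # Each option is an independent binary decision, so confidences need not sum to 1.
    confidences = [sigmoid(scorer(passage, question, option)) for option in options]
    best = max(range(len(options)), key=lambda i: confidences[i])
    return best, confidences


best, confidences = single_choice_decision(
    "the meeting was moved to friday afternoon",
    "when is the meeting",
    ["monday morning", "friday afternoon", "next week"],
)
print(best)  # 1
```

Because each option is scored on its own, the same binary classifier can in principle be trained on examples derived from other MRC tasks, which is the kind of transfer the abstract describes.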


A Comprehensive Survey on Multi-hop Machine Reading Comprehension Datasets and Metrics

Mohammadi, Azade, Ramezani, Reza, Baraani, Ahmad

arXiv.org Artificial Intelligence, Dec-7-2022

Abstract: Multi-hop machine reading comprehension is a challenging task that aims to answer a question based on disjoint pieces of information spread across different passages. Evaluation metrics and datasets are a vital part of multi-hop MRC: models cannot be trained and evaluated without them, and the challenges that datasets pose are often an important motivation for improving existing models. Given the increasing attention to this field, they are worth reviewing in detail. This study presents a comprehensive survey of recent advances in multi-hop MRC evaluation metrics and datasets. First, the multi-hop MRC problem is defined; then the evaluation metrics are investigated with respect to their multi-hop aspect. Next, 15 multi-hop datasets published from 2017 to 2022 are reviewed in detail, and a comprehensive analysis is presented at the end. Finally, open issues in this field are discussed.

Keywords: Multi-hop Machine Reading Comprehension, Multi-hop Machine Reading Comprehension Dataset, Natural Language Processing

1-INTRODUCTION

Machine reading comprehension (MRC) is one of the most important and long-standing topics in Natural Language Processing (NLP). MRC provides a way to evaluate an NLP system's capability for natural language understanding. In brief, an MRC task refers to the ability of a computer to read and understand a natural language context and then find the answers to questions about that context. The emergence of large-scale single-document MRC datasets, such as SQuAD (Rajpurkar et al., 2016) and CNN/Daily Mail (Hermann et al., 2015), has drawn increased attention to this topic, and different models have been proposed to address the MRC problem, such as (D. However, for many of these datasets, it has been found that models do not need to comprehend and reason to answer a question. For example, Khashabi et al. (2016) showed that adversarial perturbation of candidate answers has a negative effect on the performance of QA systems. Similarly, Jia & Liang (2017) showed that adding an adversarial sentence to the SQuAD (Rajpurkar et al., 2016) context degrades the performance of many existing models.

artificial intelligence, dataset, natural language, (13 more...)

arXiv.org Artificial Intelligence

2212.0407

Country:

Africa > Namibia (0.14)
Indian Ocean > Arabian Sea (0.04)
Asia > India > Maharashtra > Mumbai (0.04)
(12 more...)

Genre: Overview (1.00)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
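The multi-hop survey above investigates evaluation metrics by their multi-hop aspect. One well-known example from the multi-hop literature, used here purely as an illustration since the abstract does not list which metrics the survey covers, is HotpotQA's joint score: it multiplies answer precision and recall by supporting-fact precision and recall, so a model only scores well if it both answers correctly and cites the right evidence sentences. A minimal sketch, with supporting facts represented as (paragraph title, sentence index) pairs:

```python
from typing import Hashable, Set, Tuple


def set_prf(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F1 between two sets, e.g. of supporting facts."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return precision, recall, 2 * precision * recall / (precision + recall)


def joint_f1(answer_p: float, answer_r: float, support_p: float, support_r: float) -> float:
    """HotpotQA-style joint F1: combine answer and supporting-fact precision/recall."""
    joint_p = answer_p * support_p
    joint_r = answer_r * support_r
    if joint_p + joint_r == 0:
        return 0.0
    return 2 * joint_p * joint_r / (joint_p + joint_r)


gold_facts = {("Paris", 0), ("France", 2)}
predicted_facts = {("Paris", 0), ("Lyon", 1)}
sup_p, sup_r, sup_f1 = set_prf(predicted_facts, gold_facts)
# Assume the predicted answer matched the gold answer exactly (answer P = R = 1).
print(sup_f1, joint_f1(1.0, 1.0, sup_p, sup_r))  # 0.5 0.5
```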


IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

Putri, Rifki Afina, Oh, Alice

arXiv.org Artificial Intelligence, Oct-25-2022

Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU), as it is often included in several NLU benchmarks (Liang et al., 2020; Wilie et al., 2020). However, most MRC datasets contain only answerable questions, overlooking the importance of unanswerable ones. MRC models trained only on answerable questions will select the span that is most likely to be the answer, even when the answer does not actually exist in the given passage (Rajpurkar et al., 2018). This problem is especially pronounced in medium- to low-resource languages like Indonesian. Existing Indonesian MRC datasets (Purwarianti et al., 2007; Clark et al., 2020) are still inadequate because of their small size and limited question types: they cover only answerable questions. To fill this gap, we build a new Indonesian MRC dataset called I(n)don'tKnow-MRC (IDK-MRC) by combining automatic and manual unanswerable question generation, which minimizes the cost of manual dataset construction while maintaining dataset quality. Combined with the existing answerable questions, IDK-MRC consists of more than 10K questions in total. Our analysis shows that our dataset significantly improves the performance of Indonesian MRC models, with a large improvement on unanswerable questions.

machine learning, natural language, question answering, (20 more...)

arXiv.org Artificial Intelligence

2210.13778

Country:

South America > Colombia (0.14)
South America > Chile (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(14 more...)

Genre: Research Report (0.82)

Industry:

Education > Assessment & Standards > Student Performance (0.61)
Leisure & Entertainment (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
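IDK-MRC targets the failure mode described in its abstract, where a model trained only on answerable questions always returns some span. A common way for extractive models to abstain, popularized by SQuAD 2.0 baselines and assumed here rather than taken from the IDK-MRC paper, is to compare the score of a reserved "no answer" position (index 0, standing in for a [CLS] token) against the best real span and return no answer when the gap exceeds a tuned threshold. The function name and default values below are illustrative.

```python
from typing import List, Optional, Tuple


def predict_with_no_answer(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
    null_threshold: float = 0.0,
) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) token span, or None to abstain.

    Assumes position 0 is reserved for "no answer", as with a [CLS] token.
    """
    if len(start_logits) != len(end_logits):
        raise ValueError("start and end logits must have the same length")
    null_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best_span: Optional[Tuple[int, int]] = None
    n = len(start_logits)
    # Search all spans that start after position 0, end no earlier than they start,
    # and are at most max_answer_len tokens long.
    for start in range(1, n):
        for end in range(start, min(n, start + max_answer_len)):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)
    # Abstain when "no answer" beats the best span by more than the threshold.
    if best_span is None or null_score - best_score > null_threshold:
        return None
    return best_span


print(predict_with_no_answer([5.0, 1.0, 0.5, 0.2], [5.0, 0.3, 2.0, 0.1]))  # None
print(predict_with_no_answer([0.0, 3.0, 0.5], [0.0, 0.2, 2.5]))  # (1, 2)
```

In practice `null_threshold` would be tuned on a development set that contains unanswerable questions, which is the kind of data IDK-MRC adds for Indonesian.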


PQuAD: A Persian Question Answering Dataset

Darvishi, Kasra, Shahbodagh, Newsha, Abbasiantaeb, Zahra, Momtazi, Saeedeh

arXiv.org Artificial Intelligence, Feb-13-2022

PQuAD includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable. We examine various properties of the dataset to show its diversity and its level of difficulty as an MRC benchmark. By releasing this dataset, we aim to facilitate research on Persian reading comprehension and the development of Persian question answering systems. Our experiments with different state-of-the-art pre-trained contextualized language models show 74.8% Exact Match (EM) and 87.6% F1-score, which can serve as baseline results for further research on Persian QA.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.csl.2023.101486

2202.06219

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)
Education (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
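The PQuAD baselines are reported as Exact Match (EM) and F1. The sketch below follows the usual SQuAD-style definitions: EM checks whether the normalized prediction equals the normalized gold answer, and F1 measures token overlap, with an empty prediction earning credit only when the gold answer is also empty (an unanswerable question). The normalization (lower-casing, stripping ASCII punctuation and English articles) is the SQuAD convention and is an assumption here; Persian text would need its own normalization, for example for Persian punctuation and character variants, and PQuAD's official evaluation may differ.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lower-case, drop punctuation and articles, squash spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # An empty answer marks an unanswerable question: credit only if both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Count overlapping tokens, respecting repeated tokens.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Tehran.", "tehran"))  # 1.0
print(f1_score("in northern Tehran", "Tehran"))  # 0.5
```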


Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources

Zhang, Taolin, Wang, Chengyu, Qiu, Minghui, Yang, Bite, He, Xiaofeng, Huang, Jun

arXiv.org Artificial Intelligence, Aug-24-2020

Machine Reading Comprehension (MRC) aims to extract answers to questions given a passage. It has been widely studied recently, especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to simultaneously predict answers to medical questions and the corresponding supporting sentences from medical information sources, in order to ensure that the medical knowledge provided is highly reliable. A high-quality dataset, named the Multi-task Chinese Medical MRC dataset (CMedMRC), is manually constructed for this purpose and analyzed in detail. We further propose a Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models through a dynamic fusion mechanism of heterogeneous features and a multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2008.10327

Country: Asia > China (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (0.93)
Education > Assessment & Standards > Student Performance (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
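CMedBERT is described as fusing context-aware and knowledge-aware token representations through a dynamic fusion mechanism. The abstract does not give the exact formulation, so the sketch below shows a generic gated fusion, a standard technique named here as a stand-in rather than as the paper's mechanism: a sigmoid gate computed from both vectors decides how much of the contextual vector `h` versus the knowledge vector `k` to keep for a token. The weights `w_h`, `w_k` and bias `b` would be learned in a real model; here they are plain inputs, and pure Python lists keep the example dependency-free.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    h: Sequence[float],
    k: Sequence[float],
    w_h: Sequence[float],
    w_k: Sequence[float],
    b: float,
) -> List[float]:
    """Mix a context vector h and a knowledge vector k with a scalar gate.

    gate = sigmoid(w_h . h + w_k . k + b); output = gate * h + (1 - gate) * k
    """
    if not (len(h) == len(k) == len(w_h) == len(w_k)):
        raise ValueError("all vectors must have the same dimension")
    gate_logit = sum(w * x for w, x in zip(w_h, h)) + sum(w * x for w, x in zip(w_k, k)) + b
    gate = sigmoid(gate_logit)
    return [gate * h_i + (1.0 - gate) * k_i for h_i, k_i in zip(h, k)]


# With zero weights and bias the gate is 0.5, so the output is the average of h and k.
print(gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], 0.0))  # [0.5, 0.5]
```

A vector-valued gate, with one sigmoid per dimension, is an equally common variant of the same idea.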


A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets

Zeng, Chengchang, Li, Shaobo, Li, Qin, Hu, Jie, Hu, Jianjun

arXiv.org Artificial Intelligence, Jun-21-2020

Machine Reading Comprehension (MRC) is a challenging NLP research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, many MRC models have already surpassed human performance on many datasets, despite the obvious large gap between existing MRC models and genuine human-level reading comprehension. This shows the need to improve existing datasets, evaluation metrics, and models to move MRC models toward 'real' understanding. To address the lack of a comprehensive survey of existing MRC tasks, evaluation metrics, and datasets, herein (1) we analyzed 57 MRC tasks and datasets and proposed a more precise classification method for MRC tasks with 4 different attributes; (2) we summarized 9 evaluation metrics of MRC tasks; (3) we summarized 7 attributes and 10 characteristics of MRC datasets; and (4) we discussed some open issues in MRC research and highlighted some future research directions. In addition, to help the community, we have collected, organized, and published our data on a companion website (https://mrc-datasets.github.io/) where MRC researchers can directly access each MRC dataset, papers, and baseline projects, and browse the leaderboard.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2006.1188

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(27 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Setting (0.92)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)


BIOMRC: A Dataset for Biomedical Machine Reading Comprehension

Stavropoulos, Petros, Pappas, Dimitris, Androutsopoulos, Ion, McDonald, Ryan

arXiv.org Machine Learning, May-13-2020

We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. Care was taken to reduce noise compared to the previous BIOREAD dataset of Pappas et al. (2018). Experiments show that simple heuristics do not perform well on the new dataset, and that two neural MRC models that had been tested on BIOREAD perform much better on BIOMRC, indicating that the new dataset is indeed less noisy or at least that its task is more feasible. Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better. We also introduce a new BERT-based MRC model, the best version of which substantially outperforms all other methods tested, reaching or surpassing the accuracy of biomedical experts in some experiments. We make the new dataset available in three different sizes, also releasing our code and providing a leaderboard.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2005.06376

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Greece (0.04)
(9 more...)

Genre: Research Report > New Finding (0.94)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
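The BIOMRC abstract notes that simple heuristics do not perform well on the new dataset. To make that kind of baseline concrete, the sketch below implements one heuristic of this general sort (not necessarily one the authors evaluated) for a cloze-style item: pick the candidate entity that occurs most often in the passage. It assumes, for illustration only, that entities are written as pseudo-identifiers of the form `@entityN`; the dataset's actual markup should be checked before reusing it.

```python
import re
from collections import Counter
from typing import List

# Assumed entity format for illustration; BIOMRC's real markup may differ.
ENTITY_PATTERN = re.compile(r"@entity\d+")


def most_frequent_candidate(passage: str, candidates: List[str]) -> str:
    """Cloze heuristic: answer with the candidate mentioned most often in the passage.

    Ties go to the candidate listed first, because max() keeps the first maximum.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    counts = Counter(ENTITY_PATTERN.findall(passage))
    return max(candidates, key=lambda candidate: counts[candidate])


passage = "@entity1 binds @entity2 , and @entity2 expression rises in @entity3 cells ."
print(most_frequent_candidate(passage, ["@entity1", "@entity2", "@entity3"]))  # @entity2
```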


More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering

Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources

A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets

BIOMRC: A Dataset for Biomedical Machine Reading Comprehension